Predicting the species abundance distribution using a model food web 
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A large number of models of the species abundance distribution (SAD) have been proposed, many 
of which are generically similar to the log-normal distribution, from which they are often indistin- 
guishable when describing a given data set. Ecological data sets are necessarily incomplete samples 
of an ecosystem, subject to statistical noise, and cannot readily be combined to yield a closer ap- 
proximation to the underlying distribution. In this paper we use empirical data obtained from an 
ecosystem model to study the predicted SAD in detail, resolving features which can distinguish 
between models but which are poorly seen in field data. We find that the power-law normal dis- 
tribution is superior to both the log-normal and logit-normal distributions, and that the data can 
improve on even this at the high-population cut-off. 
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I. INTRODUCTION 



The species abundance distribution (SAD) is one of the 
most widely studied descriptions of an ecological commu- 
nity. To determine it, the number of species in a given 
community which have n observed individuals is plotted 
against n. The shape of this plot has been investigated 
by a great many empiricists and theorists over the years, 
beginning with the classic work of Fisher et al. []| and 
Preston [§]. Reviews of the subject [1, 0, H, H, 0| reveal 
the large number of mechanisms that have been proposed 
to explain the observed SAD. Many of these mechanisms 
predict the essential aspects of the observations, that is, 
a few very abundant species and many rare species. As 
a consequence it has become very difficult to falsify pro- 
posed mechanisms from empirical data, which has led to 
the authors of the most recent multi-author review [Q] to 
contrast the development of the analysis of SADs with 
"a healthy scientific field" in which "theoretical, empiri- 
cal and statistical developments [...] advance roughly in 
parallel" . 

In this paper we suggest a way forward which is in 
effect intermediate between the theoretical and empiri- 
cal approaches. We measure the SAD in an established 
model which constructs an ecological community as a 
set of predator-prey interactions [8j]. The model itself 
was originally created so that many of its key properties 
were emergent and not put in by hand. So, for instance, 
trophic levels emerge from the nature of the predator- 
prey interactions; species are not labelled as "plants", 
"herbivores" or "carnivores" a priori. This contrasts with 
traditional theoretical approaches which either postulate 
a mechanism, or if a model community is put forward it 
is usually rather simple, with the form of the SAD fol- 
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lowing from one of the fundamental aspects of the the- 
ory. Conversely, measurements taken in the field will of 
necessity include numerous influences involving climate, 
terrain, location etc., which are not present in the model 
we use to measure SADs. Thus the SADs we measure 
will be free of these external influences, but still be de- 
termined by influences which are too complex to easily 
characterise. This approach will also allow us to measure 
SADs for a multi-trophic level community whereas, so far 
as we are aware, all other predictions for SADs have been 
for communities of trophically similar species. 

The model we will be using (called the Webworld 
model) has been developed over a number of years 
d, H, [13, [ll|. In it, species are defined in terms of 
traits (phenotypic and behavioural characteristics), and 
it is the nature of the interactions between these traits 
which define the nature of interactions between species. 
This community is built up from a small number of 
species through a speciation mechanism which creates 
a new species with a novel set of features. Resources 
are distributed through a quite elaborate set of equa- 
tions governing population dynamics with adaptive for- 
aging. Overviews of the model are given in review articles 
Iisl . [T^ , and more briefly in section [Til In section 3 
we outline the method of our analysis, in section 4 we de- 
scribe the results obtained and we conclude with a review 
of the results in section 5. 



II. MODEL DESCRIPTION 

The Webworld model consists of a set of species, each 
defined by its unique combination of ten different fea- 
tures. The features are chosen from a set of L possi- 
ble features determined at the start of the simulation, at 
which point two species are created. One of these is the 
environment species, which has a fixed population for all 
time and is the ultimate source of energy for all species 
in the food web. The other initial species is the common 
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ancestor of all species encountered during a simulation 
run. 

The dynamics of the Webworld model occur on three 
separated time-scales. The longest of these is the evo- 
lutionary time-scale, on which new species are added as 
mutated versions of extant species. Specifically to the im- 
plementation of Webworld, a species is selected at ran- 
dom without regard to its population, except that this 
must be non-zero. One individual of that species is then 
used to found a new species identity, sharing all features 
but one with the parent species. The remaining feature 
is selected to avoid repetition either of the same feature 
within one species or the same set of features between 
species, but is otherwise selected at random. The newly 
introduced species is then subject to the same population 
dynamics as all other species, which is the dynamical pro- 
cess that occurs on the intermediate time-scale. 

Population dynamics occurs by balance of energy; en- 
ergy is gained through "predation" , which in the case of 
feeding on the environment species we interpret as au- 
totrophy. Each species i changes its population ac- 
cording to the balance equation 

hi ^ X'^gijHj -^gjiUi - drii, (1) 

j 3 

where gij is the functional response, the dependence of 
the rate of energy consumption by species i on the pop- 
ulation of species j. The factor of A = 0.1 introduces an 
ecological efficiency whereby the energy lost to species j 
is greater than that gained by its predator i. Thus the 
first term on the right hand side of Eq. ([1]) is the en- 
ergy income of species i summed across all prey species, 
while the second term is the energy loss summed across 
all predators. If species a does not feed on species b then 
dab = 0, and hence this does not contribute to either 
sum. The final term in Eq. ([1]) is the loss of energy from 
species i due to death of its constituent individuals at 
rate d per individual; the expected lifespan of an individ- 
ual is therefore l/d, which for simplicity we take to be 
the same across all species. Death appears in our model 
purely as an energy loss term and cannot be made an 
evolvable quantity, since it has a preferred value of zero. 

The shortest time-scale in the Webworld model refiects 
the choice of foraging strategy by individuals of each 
species. The functional response for Eq. ^ is given by 



where fij is the fraction of its time species i spends feed- 
ing on prey species j, which is the quantity to be op- 
timised in order to maximise 'Ylij9ij^j- ^ij ^-iid are 
constants defined by relating the features of species i and 
j, S indicating the ability of i to feed on j, and a relating 
to the degree of inter-specific competition. To prohibit 
mutual predation the matrix 5* is made anti-symmetric, 
thus Sij = —Sji, and the shortest possible feeding loop 
involves three species. Matrix a is symmetric, with max- 
imum competition an = 1 between members of the same 



species, and minimum competition 0.5 between highly- 
diff'erent species. By calculating S and a based on a set 
of features largely conserved during speciation we ensure 
that each newly-founded species has similar abilities to 
its parent species, with which it is also in strong com- 
petition, and in particular the dynamics of two identical 
species, were they allowed, would be indistinguishable 
from the dynamics of pooling them as one species. 

In Drossel et al. [8] an evolutionarily stable strategy 
was shown to exist for foraging, which can be found by 
iteratively solving Eq. ^ with the condition that 

k = (3) 

Z^fc 9ik 

The result of the repeated application of these dynam- 
ics is the gradual construction of a complex food web. 
Species are removed if their population falls below 1, and 
the fixed population of the environment species, i?, as 
such determines the expected number of species present 
in the food web at any time, though there is a contin- 
ual turnover of species and consequent fluctuation in any 
given food web measure. After running the model for 
a large number of evolutionary time steps, there is no 
systematic change in quantities such as the number of 
concurrent species, and the food web structure appears 
to have matured. It is on such webs that we examine the 
species abundance distribution. 



III. METHOD 

Using the Webworld model discussed in the previous 
section we generate sets of communities for which the 
ensemble species abundance distribution (SAD) can be 
examined in detail. Because we use the same set of pos- 
sible features and the same environment species in each 
case, we assume that the underlying SAD does not al- 
ter between model realisations. In this case it is possible 
to pool the resultant communities in order to determine 
the SAD with improved statistical noise. Details of the 
computational approach are given in section IIII Al In 
section UlIBI we discuss the functions which we fit to the 
data, and the optimisation criteria of the fitting. In sec- 
tion IIII CI we discuss the problems of generalising fits to 
include communities differing in size or trophic level. 



A. Comparative models 

Although the Webworld model can simulate ecological 
communities in reasonable time, to create large complex 
communities takes considerable computation, and to gen- 
erate enough simulations to get good statistics across a 
broad range of parameter space is difficult. We therefore 
perform the first examination on a variant of Webworld 
in which all species feed exclusively on the environment. 
Because all species are basal, the relative populations 
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are determined by the relative ability, S, and competi- 
tion, a, terms between existing species, which are se- 
lected by evolution in the same way as in the full model. 
By avoiding a large part of the computational effort we 
are able to generate large numbers of webs for compar- 
ison, and in the results presented here gather statistics 
from a set of one hundred model runs for each value of 
resources, R. In section IIVI we focus on the fitting of 
food webs with resources 10'^, 10*, 10^ and 10^, but sim- 
ulations were performed for numerous other values of R 
within this range to show that interpolation of the re- 
sults is reasonable. The minimum value of R results in 
communities with few species, which become correspond- 
ingly harder to characterise in terms of an SAD. Larger 
values of R become increasingly computationally expen- 
sive. Rather than attempting to extend the range of R to 
larger values, we created a total of 900 basal communities 
at i? = 10® for more detailed analysis of the tails of the 
distribution. Because the common theoretical SADs have 
been selected based on reproduction of the modal peak, 
and are poorly constrained by observations, the tails of- 
fer the largest differences between candidate SADs. Due 
to the much larger computational complexity of the full 
Webworld model, we have only a sample of ten compara- 
ble food webs for large R from which to deduce trophic 
SADs. 



B. Fitting method 

As can be seen in Figure [H the probability distribution 
function (p.d.f.) of species abundance has a rather noisy 
histogram even for the largest collection of independent 
communities we were able to assemble with the avail- 
able computer time. Fitting a distribution function to 
such histograms is problematic for several reasons. The 
noise makes it difficult to algorithmically optimise the 
fitting function, and hence can obscure differences in the 
strength of different functional forms. More importantly, 
the apparently optimal parameters and associated fitness 
will depend on the arbitrary choice of bin width and po- 
sition, since changing these parameters can significantly 
alter the distribution of noise between the bins. Further- 
more, the distribution function underlying the observed 
SAD is likely to have a functional form other than our 
approximations, and in general may be significantly more 
complicated than we can extract from data so long as the 
noise remains. Rather than obtaining a function which 
closely matches the value of the p.d.f. for most popula- 
tion sizes, but which omits important features, we prefer 
to recover a smoothed version of the distribution function 
which correctly predicts the total number of species. As a 
consequence of these considerations we fit the integrated 
version of the fitting function to the empirical cumula- 
tive distribution function (c.d.f.), whose value at a given 
population TV is the measured number of species with 
rii < N. This definition matches the type of p.d.f. used 
by whose integral is the expected number of species. 



P.d.f.s may also be defined such that the area enclosed 
is unity. To illustrate the fitting procedure we present 
plots of the measured and fitted c.d.f.s in addition to the 
p.d.f.s, and indicate the goodness-of-fit by plotting the 
residuals of the c.d.f., that is, the difference between the 
integrated fitting function and the measured c.d.f. 

The strongest condition that we impose on each fitting 
function is that it should correctly predict the number of 
species more abundant than the least abundant species 
observed. Below this population the distribution may be 
terminated by a veil line, but we do not allow any such 
consideration for populations above the most abundant 
species observed. Subject to this condition we optimise 
the parameters of each theoretical distribution function, 
/(IniV), by minimising a quantity analogous to x^- One 
such statistic is the Cramer- von Mises test [191, defined 



as 



1-0.5- fin,) 



(4) 



where f{ni) is the predicted number of species less abun- 
dant than Hi, subject to the veil line at ni, and S is 
the number of species observed. Although this is readily 
generalised to an ensemble of SADs, it attributes most 
weight to the peak of the distribution at the expense of 
fitting the tails, and we instead minimise the quantity 
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C{N)-f{N)) dlnN, 



(5) 



where C{N) is the observed number of species less abun- 
dant than N. For many distributions A'max oo, 
but functions such as the logit-normal distribution are 
parametrised by the total number of individuals ob- 
served, J, in which case iVmax = J- Unlike the Cramer- 
von Mises statistic, places equal weight in all inter- 
vals of IniV. Given that the theoretical distribution al- 
most certainly differs from the distribution underlying 
the data, this tends to avoid problematic regions, such as 
ranges of TV in which few species are observed, but where 
the empirical and theoretical c.d.f.s differ. The tails of 
the distribution often behave in this manner. Having 
identified optimal fitting parameters by minimising k^, 
we follow the advice of [6|] that "claim [s] of a superior fit 
must be robust by being superior on multiple measures" 
by evaluating the Kolmogorov-Smirnov statistic for 
each theoretical distribution. Defined for a single reali- 
sation as 

d = 5-i/Vax, {|z - 1 - /(n,)| , \i - , (6) 

d corresponds to the greatest deviation between the em- 
pirical and theoretical c.d.f.s. This must occur at one of 
the observed species, which correspond to steps in the 
empirical c.d.f. It is necessary to evaluate the differ- 
ence between the empirical and theoretical c.d.f. both 
immediately before and after the step, and hence the 
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FIG. 1: The fitted species abundance distribution for basal 
communities witfi resources R = 1000, 10 000, 100 000 and 
1 000 000. Tiie histogram indicates the data in bins of width 
0.1 in InA'^. The solid curves indicate optimal log-normal 
fits, the dotted lines optimal logit-normal fits, and the dashed 
lines optimal power-law normal fits. Distributions to the right 
correspond to increasing R. 
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FIG. 2: The fitted cumulative species abundance distribu- 
tion for basal communities with resources R = 1000, 10 000, 
100 000 and 1 000 000. The solid line shows the data, the dot- 
ted line marks the log-normal distribution, and the dashed 
line the power-law normal distribution. Distributions to the 
right correspond to increasing R. 



'maximum' operator in Eq. ([6]) contains two terms for 
each observation i. Although the values of d obtained 
imply rejection of the theoretical distributions given the 
amount of data available, we use d as a measure of the 
relative goodness-of-fit to distinguish between theoretical 
distributions. Other measures of goodness-of-fit tend to 
relate to binned data rather than the c.d.f., and provide 
correspondingly weaker evidence (171] . 

Although the log-normal distribution has been criti- 
cised as inappropriate for application to SADs it is 
a commonly examined form of the SAD and we there- 
fore adopt it as one of the theoretical SADs we fit to 
the data. We also consider the logit-normal distribu- 
tion preferred by Williamson & Gaston 18]. Whereas 
the log-normal distribution appears as a normal distribu- 
tion when plotted against a logarithmic population-axis, 
the logit-normal has a normal distribution when plotted 
against a logit population axis. Our analysis will consis- 
tently use the logarithmic axis both for plotting and for 
the integration of /c^, so while the log- normal distribution 
has the form 



P(lniV)dln7V = Aexp<^ - 



(IniV-ln^)' 
2^2 



dlniV, (7) 



with the fitting parameters A, /i and a, the logit-normal 
distribution includes an extra factor, giving 



P(lniV) = A 



J-N 



exp 



In 



J-N -^^^ J-fi 



2a2 



(8) 

We also consider a third fitting function, the power- 
law normal distribution, which appears normal against 
a power-law population axis. Transformed to a logarith- 



mic axis, this has the functional form 



P(lniV) = AaN'' exp | I , (g) 



where a is the power-law index. We do not consider 
the log-series distribution since our data are with few 
exceptions peaked at large N, whereas the p.d.f. of the 
log-series distribution decreases from = 1 even when 
drawn against a logarithmic population-axis. The broken 
stick distribution [19] was found to be similar in form to 
the observed distribution, but inferior to the log-normal 
in all cases. 



C. Comparison of food webs 

Since we are applying the same distribution function 
with different parameters to basal food webs of different 
sizes, and to the SADs of different trophic levels within 
a single community, in the ideal case a parametrisation 
of the fitting coefficients in terms of resources, R, and 
trophic level, I, would be found. Because small values of 
R correspond to food webs with fewer species, complica- 
tions arise in weighting the contribution to goodness-of- 
fit from differently sized webs, and we do not in this paper 
attempt to simultaneously fit webs of different sizes. By 
examination of the best-fitting parameters for each web 
we can determine the dependence of parameters on R ex- 
cept in one case; the power-law index a of the power-law 
normal distribution. For most values of R the goodness- 
of-fit depends quite weakly on this parameter, and the 
optimal value of a is poorly constrained for any one web. 
Since we were unable to identify a systematic trend or 
strongly constrain the value of a, we chose a = 0.2 as a 
constant value consistent with the optimised parameters, 
and fixed this value for all results presented here. 



FIG. 3: The same data plotted in Figure[2]shown as residuals; 
the solid line corresponds to the empirical c.d.f. minus the 
log-normal distribution, the dotted line to the data minus 
the logit-normal distribution, and the dashed line to the data 
minus the power-law normal distribution. Offsets of -0.5, -f .5 
and -2.5 have been applied to data for resources R = 10 000, 
100 000 and 1 000 000 respectively. 



IV. RESULTS 

In section IIV Al we present the results of the fitting 
procedure for the basal communities. These should give 
the least complicated species abundance distributions 
(SADs), since all species feed on a single resource and 
are in direct competition with each other. In comparison, 
the trophic communities examined in section IIVBI feed 
on multiple food sources themselves distributed in abun- 
dance, and compete with different subsets of the other 
species. In section IIV CI we make use of the large num- 
ber of simulation runs which can be performed to make 
a detailed examination of the low- and high-population 
tails of the empirical distribution, and compare this to 
the behaviour of the fitted distributions. 



A. Basal community 

The results for this version of the model are the most 
complete in that one hundred simulations runs were ex- 
amined for each value of resources i?, and a large num- 
ber of values of the continuous parameter were exam- 
ined. In Figures [1] to [3] only four of these realisations 
are plotted, corresponding to R — 10'^, 10'', 10^ and 10^, 
which include the two most extreme values of R for which 
webs were calculated. The general features of the SAD 
for these four values are typical, as is the goodness of 
fit achieved by each of the three fitting functions exam- 
ined. It is clear from Figure [T] that the observed dis- 
tribution is left-skewed (has an over-abundance of rarer 
species), a characteristic absent from the log- normal dis- 
tribution. The logit-normal distribution does not have 
significantly improved skew over the log-normal distri- 
bution, since the most abundant species from any run 
has less than one quarter of the mean number of indi- 
viduals J, and the logit function is therefore well below 



FIG. 4: Parameters of the power-law normal fit to the basal 
community SAD for all values of resources examined. The 
solid line passes through points marking the mean population, 
fi in Eq. ([9}; the dashed line marks the standard deviation, 
a. Squares mark the Kolmogorov-Smirnov d value, and stars 
mark the quantity K described in the text. 



its asymptotic cut-off. Williamson & Gaston ^T§\ note 
that in this limit the logit-normal distribution approaches 
the log-normal. The power-law normal distribution much 
more closely captures the smaller high-A^ tail. The cor- 
responding cumulative distribution functions (c.d.f.s) are 
plotted in Figure [51 where the logit-normal distribution 
has been omitted for clarity. It can be seen, especially 
for R = 10^, that the log- normal distribution underes- 
timates the cumulative number of species in both tails, 
which corresponds to the skew of the p.d.f., and that 
even for one hundred realisations the empirical c.d.f. is 
far from smooth. More instructive than the c.d.f. are the 
residuals of this plot, that is, the difference between the 
instantaneous value of the empirical c.d.f. and the fitting 
function. These are shown in Figure [3] for all three fitting 
functions. The integral of the square of this plot is our 
goodness-of-fit measure k^, and the maximum deviation 
from zero is the Kolmogorov-Smirnov d-measure. Sub- 
stantial structure can be seen in the residuals, especially 
the central peak for each value of R when examining the 
power-law normal distribution, which most closely mim- 
ics the tails. Table U records the values of and the 
Kolmogorov-Smirnov d value for each fit, for fitting pa- 
rameters minimising k^. Basal communities are labelled 
by the value of resources, R, while trophic levels exam- 
ined in section lTV Bl are labelled according to the trophic 
level, I. For the basal food webs the power-law normal fit 
always outperforms both the logit-normal and log-normal 
distributions in terms of A:^, and is only in one case in- 
ferior to the logit-normal distribution as measured by d. 
A further comparison of the relative merits of the theo- 
retical distribution functions is given in section flV CI 

In Figure [?] we plot the dependence of the parame- 
ters of the power-law normal fit on R, as well as the two 
goodness-of-fit indicators used. The solid line, marking 
the population of the peak of the distribution, indicates 
the very near linearity of the value of the peak of the 
distribution with In R. The standard deviation of the 
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FIG. 5: Histograms mark the observed species abundance 
distribution for the four trophic levels found in the ten Web- 
world communities examined. Trophic levels two and four are 
marked by dotted and dashed lines respectively. Solid curves 
mark the optimal log-normal fits to each trophic level, and 
dashed lines the optimal power-law normal fits. 
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FIG. 6: The cumulative species abundance distribution for 
Webworld communities corresponding to the four observed 
trophic levels. Higher trophic levels are to the left of lower lev- 
els, having smaller typical populations. Optimal log-normal 
fits are marked by dotted lines, and optimal power-law normal 
fits by dashed lines. 



distribution increases more rapidly than linearly, as in- 
dicated by the dashed line. The value of increases 
with In R for two reasons. Firstly, it is measured on the 
full c.d.f. rather than the normalised distribution, and 
so tends to increase as the square of the expected num- 
ber of species, S. Secondly, because it is an integrated 
measure, it tends to increase with the width of the distri- 
bution, which we characterise by the standard deviation 
of the log- normal distribution, ctln ■ It is more appropri- 
ate to use this measure than the standard deviation of 
the power-law normal itself since the former corresponds 
naturally to the width along the logarithmic population 
axis. In Figure [3] we plot the quantity 



K = 



1000/fc2 



(10) 



which compensates for these effects, and includes a fac- 
tor of 1000 to scale it appropriately for that plot. It can 
be seen that intermediate values of R are the best fit- 
ted, as measured by either K or the Kolmogorov-Smirnov 
d, perhaps due to relatively small amounts of additional 
structure. 



B. Trophic levels 

Having established that the power-law normal distri- 
bution describes the SAD reasonably well for basal com- 
munities, we apply it to individual trophic levels of full 
Webworld communities to determine the relevant fitting 
parameters. Due to the small number of food webs avail- 
able, and the small number of species in each trophic level 
for any one web, it is inappropriate to seek deviations 
from this distribution with the data available, although 
we find that the power-law normal distribution is ade- 
quate, and superior in all cases to the log-normal distri- 
bution, having smaller values of both and d. As indi- 
cated by the values given in table HI the logit-normal dis- 



tribution marginally improves upon the power-law nor- 
mal distribution for trophic levels 1 and 3, but is signifi- 
cantly inferior to the power-law normal for trophic level 
2. For trophic level 1, the typical number of species ob- 
served per web in the data examined was only 5.9, the 
most abundant species being nearly half the total popu- 
lation of its trophic level. For trophic level 3 the lower 
tail of the distribution was truncated, and although here 
the logit-normal distribution performed better than the 
power-law normal, it is not clear that the logit-normal is 
able to adequately reproduce the whole SAD. Although 
four trophic levels were found in the empirical data, a 
very small number of species were found in trophic level 
4. It can be seen in Figure [S] that the distribution func- 
tion of this level is little more than the high-population 
tail of the distribution function, and no reliable results 
can be obtained by its analysis. 

For comprehension of the empirical distribution being 
fitted we reproduce, in Figure[Sl the cumulative distribu- 
tion function constructed from the simulation data along 
with the optimal log-normal and power-law normal fits. 
It can be seen clearly from this figure that the distribu- 
tion of the second trophic level, which has the largest 
number of species in total, is closest in form the those of 
the basal communities. The distribution of trophic level 
three, to its left, passes the veil line before a significant 
fraction of the low-population tail has been exposed, but 
is otherwise in good agreement with the basal community 
distributions. The lowest trophic level, however, seems 
relatively truncated, resulting in a much sharper cutoff 
at large N than is reproduced by either the log-normal or 
power-law normal distributions. The cause of this may 
relate to the presence of predators, who can be expected 
to preferentially target the most abundant prey, but ad- 
ditional data are required to investigate this hypothesis. 
The residuals of the c.d.f. fits are shown in Figure [71 it 
is possible that similar structure in these is present to 
that seen for the basal communities in Figure [31 but the 
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FIG. 7: The same data plotted in Figure|6]shown as residuals; 
the solid line corresponds to the empirical c.d.f. minus the 
log-normal distribution, the dotted line to the data minus 
the logit-normal distribution, and the dashed line to the data 
minus the power-law normal distribution. Offsets of -1.0, -2.0 
and -2.5 have been applied to data for trophic levels 2, 3 and 
4 respectively. No logit-normal fit was obtained for trophic 
level 4 due to the absence of a positive optimal mean. 



degree of noise is greater. 

In Figure [8] the mean and standard deviation of the 
power-law normal distribution are plotted as a function 
of trophic level. While the standard deviation appears to 
decline linearly with trophic level, the distribution mean 
may decrease more slowly. However, if the results for 
trophic level four are misleading due to the extremely 
high position of the veil line, and the distribution of basal 
species is possibly altered through predation as discussed, 
the reliability of these results is limited. The quantity 
K, defined in Eq. PH)) . is much better for trophic lev- 
els two and three than for either the basal or fourth 
trophic level, although only marginal improvements in 
the Kolmogorov-Smirnov d value can be seen. 



C. Distribution tails 

An advantage of examining computationally derived 
communities of species is that extremely large data sets 
can be constructed with relative ease, subject only to 
the availability of computer time. In addition, the Web- 
world model produces complete ecological communities, 
and the sampling effects associated with field data are 
avoidable. As such it is much more feasible to examine 
the form taken by the tails of the distribution function, 
which McGill et al. 6] note are subject to noisy data, but 
which often contain the main differences between theo- 
retical distributions. 

To construct a high-quality empirical SAD whose tails 
could be examined, nine hundred simulation runs were 
performed for the basal community with R — 10^. The 
low-population tail of this distribution is plotted in Fig- 
ure [HI where the logarithm of the binned species abun- 
dance has been taken to expose the tail. The fact that 
a linear regression to this data (not shown) produces a 



FIG. 8: Parameters of the power-law normal fit to the trophic 
community SAD for all values of resources examined. The 
solid line passes through points marking the mean value of A''. 
The lower dashed line marks the standard deviation, while the 
upper dashed line multiplies this quantity by 10 for clarity. 
Squares mark the Kolmogorov-Smirnov d value. Stars mark 
the quantity K defined in the text. 



good fit for In iV < 7 implies that in this regime a power- 
law fit. 



P(lniV) cx 7V°, 



(11) 



with a ~ 4/3, is obeyed. The power-law normal distribu- 
tion is able to reproduce this form reasonably well, while 
both the log-normal and logit-normal distributions sig- 
nificantly underestimate the number of species present. 

The distribution tail for large populations is shown in 
Figure 1101 Here bins have been chosen to be uniform 
in width in population, rather than uniform in IniV, in 
order to resolve the tail. The result is that a different 
version of the distribution is shown. 



P{N) AN = 



P(ln7V) 
N 



dTV, 



(12) 



which, when integrated with respect to N , gives the c.d.f. 
Note that in order to highlight the form of the decay, the 
population axis has been stretched to a power-law. The 
regression line, plotted as a dash-dot line, indicates that 
the high-population tail has the form 



P{N) diV cx exp <^ - 



N 
7140 



1.4116 



dN. 



(13) 



As can be seen in Figure [TOl this form of the decay de- 
clines more rapidly with N than any of the log-normal, 
logit-normal or power-law normal distributions exam- 
ined. 

Having established probable forms for the low- and 
high-population tails by regression to Figures [9] and [TOl 
we combine these into a distribution which has the min- 
imum value of the two tail-fitting functions for all N . In 
addition to the dashed line marking the empirical c.d.f., 
identical to that shown in Figure [H this fit is shown in 
Figure [TT] in two forms. The lower plot is the c.d.f. inte- 
grated from zero species at A'^ = 0, and the upper curve is 
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FIG. 9: The small population tail of the basal community 
p.d.f. for resources R = 10^. The p.d.f. is described in 
the text. The solid, dotted and dashed curves mark the log- 
normal, logit-normal and power-law normal fits to Figure [2] 
respectively. 



integrated down from the observed number of species so 
as to converge with the empirical distribution at large N. 
The fact that the latter curve is above the former indi- 
cates that the combined distribution underestimates the 
total number of species, implying that it under-predicts 
the p.d.f. near the peak, to which it was not fitted. Fig- 
ure [11] therefore also plots the residuals of the tail- fitting 
distribution as a histogram. There appear to be at least 
three peaks in the residuals, making it difficult to identify 
a plausible general form. Since we do not have unrelated 
basal food webs to examine, in particular to establish 
what parameters of the tail distributions are generic and 
whether the residuals show a common pattern, it is not 
appropriate to draw further conclusions about the central 
part of the distribution. We are also unable to ascribe 
a goodness-of-fit to the tail-based distribution due to its 
inability to reproduce the peak of the distribution. 



V. CONCLUSIONS 

We have investigated the form of the species abun- 
dance distribution empirically derived from simulation 
results of the Webworld food web model. This model was 
created to examine patterns of food web assembly, and 
the form of the species abundance distribution (SAD) 
was not a factor in its construction. Rather, the use of 
population dynamics to establish the success of particu- 
lar species and feeding strategies within the community 
lead naturally to variation in the abundance of species 
which appears similar to the SADs identified from real 
ecosystems. By investigating the empirical SAD from 
the simulations in the same manner as data from real 
ecosystems we are able to characterise not only the peak 
of the distribution, which is frequently observed to have a 
form similar to the log-normal distribution, but to exam- 
ine in detail parts of the distribution difficult to obtain 
data on from field studies. We agree with the conclusion 
of Williamson & Gaston [l^ that the logit-normal dis- 




10000 15000 20000 25000 30000 



FIG. 10: The high population tail of the basal community 
p.d.f. for resources R — 10®. The x-axis is linear in jv^'^^^®, 
which was found to be the power-law index minimising the 
of the regression line, but has been marked with corresponding 
values of A'' for clarity. The histogram marks the value of 
P{N) , the population density in bins of equal width in A'^. The 
solid, dotted and dashed curves mark the log-normal, logit- 
normal and power-law normal fits to Figure [2] respectively. 
The dash-dotted line indicates the best-fitting regression for 
N > 2000. 



tribution fits better, but with particular reference to the 
tails of the distribution find that the power-law normal 
distribution function is better still. In particular, the log- 
normal and logit-normal distributions predict that the 
number of species with population N falls more rapidly 
with decreasing N than we obtain from our simulation 
results, which the power-law normal distribution matches 
very well in this tail. 

The presence of structure in Figure [3] suggests that 
a more complicated function is needed to properly re- 
produce the observed SAD, but we have not been able 
to examine the reproducibility of this remaining struc- 
ture. All the food webs examined were created for the 
same set of possible features and the same environment 
species. To fully explore the results even for a single value 
of R would require the use of food webs constructed for 
'worlds' with different environment species and feature 
sets. In undertaking such a programme it would first be 
necessary to establish whether such parameters as the 
mean and variance of the fitted distribution changed, or 
more generally to construct the meta-distribution of a 
large number of Webworld 'worlds' and test, using the 
Kolmogorov-Smirnov d value, whether the empirical dis- 
tribution constructed from webs of a single family was 
consistent with the meta-distribution. 

We find that the power-law normal distribution iden- 
tified as well describing the SAD of a basal community is 
also successful in describing individual trophic levels of 
a food web. It is particularly descriptive of the second 
trophic level, which can be seen in Figures [5] and [6] to be 
the most completely realised by our empirical data. The 
higher trophic levels can also be expected to be well-fitted 
by the power-law normal distribution, although the trun- 
cation of the distribution at low populations results in the 
log-normal and logit-normal descriptions also being ad- 
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FIG. 11: The c.d.f. for the basal community with resources 
R = lO", as shown in Figure^ is shown as a dashed line. The 
solid lines mark the c.d.f. constructed from fits to the tails as 
described in the text. The histogram marks the residuals of 
the p.d.f. of this fit. 

equate. The empirical distribution of the lowest trophic 
level is more sharply truncated at high populations than 
seen for other communities, the reason for which would 
require substantial additional investigation. Unlike the 
case of examining basal communities at different values 
of R, only a small number of trophic levels are ever pos- 
sible, and hence the relation between them is harder to 
quantify. While it would be possible to construct meta- 
distributions from larger numbers of food webs, it is more 
feasible to first examine the agreement between the meta- 
distributions of basal communities and the constituent 
distributions. If there is good agreement, the agreement 
between the meta-distribution and the trophic distribu- 
tions should be examined. If not, then a very large num- 
ber of communities need to be evolved in the same envi- 
ronment in order to study the relation between trophic 
levels, potentially also examining the effect of different 
values of R. The main problem in investigating the SAD 
of numerically modelled ecosystems is the extensive com- 
puter time required to provide data. 



The SADs constructed for this paper are complete 
not only in the sense that they contain all individuals 
present in the sample area, but also in that they do not 
feature immigrant or transient species, which can con- 
tribute to the low-population tail without representing 
a viable population. While features such as immigra- 
tion from surrounding communities can easily be incor- 
porated into our model, as can finite population effects, 
their exclusion demonstrates the existence of an extensive 
low-population tail to the distribution even for a closed 
ecosystem. This contrasts with the proposal by Magur- 
ran [20] that the low-population tail is a log-series distri- 
bution of "occasional" species added to a core log-normal 
distribution. Although we do not agree with McGill [2l| 
that left-skew is purely an effect of sampling, it may be 
the case that the left-skew of incomplete samples does 
not reflect the underlying distribution. 

McGill et al. [Q] observe that most proposed SADs are 
similar to one another except in the tails, which is pre- 
cisely the region which field observations are least able 
to address due to paucity of data. This issue can be ad- 
dressed by the use of any model which can produce multi- 
ple independent realisations of its dynamics from which 
a composite SAD can be constructed, but this process 
can only be used to inform the analyses which should be 
performed on ecological data, since it is not known a pri- 
ori that any given model accurately reproduces the real 
SAD. A virtue of the Webworld model is that is produces 
a plausible SAD without any such consideration having 
been used during the model design, being based rather 
on plausible ecological rules. 
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TABLE I: Comparison of measures of goodness-of-fit for the 
log-normal, logit-normal and power-law normal distributions 
to basal community SADs and trophic SADs. 



Community Species 







S 


Log 


Logit 


Power- law 


R = 


10^ 


11.22 


0.0599 


0.0390 


0.0314 


R = 


10* 


18.15 


0.2333 


0.1571 


0.1175 


R = 


10'' 


25.42 


0.8820 


0.6296 


0.1676 


R = 


10"^ 


30.16 


1.5759 


1.1752 


0.4708 


I = 


1 


5.9 


0.1532 


0.0902 


0.0877 


I = 


2 


18.1 


0.7972 


0.5328 


0.1480 


I = 


3 


14.0 


0.2111 


0.1126 


0.1093 


Community Species 


Kolmogorov-Smirnov d 






S 


Log 


Logit 


Power-law 


R = 


10^ 


11.22 


1.0648 


1.1094 


0.9990 


R = 


10* 


18.15 


1.3244 


1.0409 


1.1660 


R = 


lO'^ 


25.42 


1.4826 


1.3669 


0.9919 


R = 


10® 


30.16 


2.0255 


1.7661 


1.6167 


I = 


1 


5.9 


1.8391 


1.5798 


1.5934 


/ = 


2 


18.1 


2.0051 


1.7753 


1.1430 


/ = 


3 


14.0 


1.8270 


1.4797 


1.5241 



